-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add configuration to create v3 ivf_pq indices via python #2941
base: main
Are you sure you want to change the base?
Conversation
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2941 +/- ##
==========================================
+ Coverage 78.95% 78.99% +0.04%
==========================================
Files 238 238
Lines 75577 75617 +40
Branches 75577 75617 +40
==========================================
+ Hits 59674 59737 +63
+ Misses 12874 12847 -27
- Partials 3029 3033 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
||
/// Use V3 Index builder. | ||
/// Only used by IVF_PQ index since IVF_PQ still creates old index format by default. | ||
pub force_use_new_index_format: Option<bool>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use an Enum to specify which version to use? We will likely to have new index format down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially thought of doing that and just pass the LanceVersion, however, I was not sure how does it tie to the index builder. Can change it to the lance version if that makes it clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. I would prefer a index_file_version
parameter over a use_new_index_format
flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are V1
and V3
if you are going to create that enum
V2 has been removed, no index is in V2 format now.
@BubbleCal @westonpace Wanted to see if there are any other review comments that I need to address apart from the enum change that @eddyxu suggested |
#[tokio::test] | ||
async fn test_create_ivf_pq_dot() { | ||
run_ivf_pq_dot_test(false).await; | ||
} | ||
|
||
#[tokio::test] | ||
async fn test_create_ivf_pq_v3_dot() { | ||
run_ivf_pq_dot_test(true).await; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of making separate test functions, could we parametrize the test functions with rstest?
|
||
/// Use V3 Index builder. | ||
/// Only used by IVF_PQ index since IVF_PQ still creates old index format by default. | ||
pub force_use_new_index_format: Option<bool>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. I would prefer a index_file_version
parameter over a use_new_index_format
flag.
I just added |
PR adds the ability to create v3 ivf_pq indices from python by passing in a bool parameter via the VectorIndexParams python. Right now it just sets the value to true for IVF_PQ but it can be used for other indices as well. Also, it doesn't change the JNI side for IVF_PQ for now, but can be added as a follow-up PR.
Added a bunch of tests in python and rust to test this.